CORE: Nonparametric Clustering of Large Numeric Databases

Authors

Andrej Taliun

Michael H. Böhlen

Arturas Mazeika

Abstract

Current clustering techniques are able to identify arbitrarily shaped clusters in the presence of noise, but depend on carefully chosen model parameters. The choice of model parameters is difficult: it depends on the data and the clustering technique at hand, and finding good model parameters often requires time-consuming human interaction. In this paper we propose CORE, a new nonparametric clustering technique that explicitly computes the local maxima of the density and represents them with cores. CORE uses an adaptive grid and gradients to define and compute the cores of clusters. The incrementally constructed adaptive grid and the gradients make the identification of cores robust, scalable, and independent of small density fluctuations. Our experimental studies show that CORE, without any carefully chosen model parameters, produces better-quality clusterings than related techniques and is efficient for large datasets.
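
The general idea of locating density maxima on a grid can be illustrated with a minimal sketch. It assumes a fixed uniform 2-D grid and a plain hill climb over cell counts. It is not the CORE algorithm: the adaptive grid and the gradient computation are not specified in this abstract, and all function names below are hypothetical.

```python
import numpy as np

def cell_counts(points, bins, bounds):
    """Histogram 2-D points on a fixed uniform grid (assumption: not adaptive)."""
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                                  bins=bins, range=bounds)
    return counts

def neighbours(i, j, shape):
    """Yield the in-bounds 8-neighbourhood of cell (i, j)."""
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            ni, nj = i + di, j + dj
            if (di or dj) and 0 <= ni < shape[0] and 0 <= nj < shape[1]:
                yield ni, nj

def climb(counts, cell):
    """Move to the densest neighbour while it is strictly denser than the current cell."""
    while True:
        best = max(neighbours(*cell, counts.shape),
                   key=lambda c: counts[c], default=cell)
        if counts[best] <= counts[cell]:
            return cell
        cell = best

def density_modes(counts):
    """Cells where hill climbs from non-empty cells end, i.e. local maxima.

    Cells on a plateau of equal counts each count as their own mode; this
    sketch does nothing to suppress small density fluctuations.
    """
    starts = zip(*np.nonzero(counts))
    return sorted({climb(counts, (int(i), int(j))) for i, j in starts})

# Toy data: three points in one corner cell, one in the adjacent cell,
# and two in the opposite corner.
points = np.array([[0.5, 0.5], [0.6, 0.4], [0.4, 0.6],
                   [1.5, 0.5], [3.5, 3.5], [3.4, 3.6]])
counts = cell_counts(points, bins=4, bounds=[[0, 4], [0, 4]])
print(density_modes(counts))  # [(0, 0), (3, 3)]
```

The sparse cell next to the dense corner climbs to that corner, so only two maxima remain. On real data a fixed grid like this one is sensitive to the bin width, which is the kind of parameter dependence the abstract says CORE avoids.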

Similar resources

Clustering Large Databases with Numeric and Nominal Values Using Orthogonal Projections

Clustering large high-dimensional databases has emerged as a challenging research area. A number of recently developed clustering algorithms have focused on overcoming either the “curse of dimensionality” or the scalability problems associated with large amounts of data. The majority of these algorithms operate only on numeric data, a few handle nominal data, and very few can deal with both num...

A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases

The problem of detecting clusters of points belonging to a spatial point process arises in many applications. In this paper, we introduce the new clustering algorithm DBCLASD (Distribution Based Clustering of LArge Spatial Databases) to discover clusters of this type. The results of experiments demonstrate that DBCLASD, contrary to partitioning algorithms such as CLARANS, discovers clusters of...

Numeric Multi-Objective Rule Mining Using Simulated Annealing Algorithm

Abstract as a single objective one. Measures like support, confidence and other interestingness criteria which are used for evaluating a rule, can be thought of as different objectives of association rule mining problem. Support count is the number of records, which satisfies all the conditions that exist in the rule. This objective represents the accuracy of the rules extracted from the da...

CRAFT: ClusteR-specific Assorted Feature selecTion

We present a framework for clustering with cluster-specific feature selection. The framework, CRAFT, is derived from asymptotic log posterior formulations of nonparametric MAP-based clustering models. CRAFT handles assorted data, i.e., both numeric and categorical data, and the underlying objective functions are intuitively appealing. The resulting algorithm is simple to implement and scales ni...

Clustering Categorical Data with k-Modes

A lot of data in real world databases are categorical. For example, gender, profession, position, and hobby of customers are usually defined as categorical attributes in the CUSTOMER table. Each categorical attribute is represented with a small set of unique categorical values such as {Female, Male} for the gender attribute. Unlike numeric data, categorical values are discrete and unordered. Th...

Publication date: 2009
